Charles Varvayanis Logo
Charles Varvayanis
Computer and Communication Systems
Since 1990
(209) 586-3782
charles@varvayanis.com

Raspberry Pi PDF, TXT & etc. Canonical Headers

Raspberry Pi step-by-step instructions for adding Automatic Canonical Header Generation for PDF, TXT and/or other file types with an Apache 2 Web Server.

These procedures apply to Raspberry Pi 5, 4 or 3 with Raspberry Pi OS (64-Bit), (32-Bit) or (Legacy, 32-Bit) running Apache 2 with or without Let's Encrypt Certificates and Certbot.


General Notes


1. General:  The procedures below are optimized for adding Automatic Canonical Header Generation for PDF, TXT and/or other file types to a Raspberry Pi 5, 4 or 3 with Raspberry Pi OS (64-Bit), (32-Bit) or (Legacy, 32-Bit) running an Apache 2 Web Server with or without HTTPS using Let's Encrypt Certificates and Certbot.

2. Automatic Canonical Header Generation for PDF, TXT and/or Other File Types:  Search engines use canonical information when building their indexes and deciding which pages to include in search results.  Unlike HTML and similar file types, PDF, TXT and other file types have no provision for carrying canonical information within the file itself, so it is delivered in an HTTP Link header instead.  The steps below implement automatic canonical header generation for PDF, TXT and/or other file types being served by the Web Server.  Alternatively, a Web Server can be configured on a file-by-file basis to deliver canonical information in headers, but that approach is not discussed further here.

3. Internet access during setup:  Many of the steps below assume and require the target Raspberry Pi is connected to a network with access to the Internet.



Notice about updates, upgrades and installations failing due to repository or network congestion or outages


Occasionally updates, upgrades and installations fail due to repository or network congestion or outages.  Sometimes a message says so, sometimes a missing file is reported, and sometimes there is just a failure message without an explanation.  When this occurs, simply run the command again.  If that does not solve the issue immediately, try again later.
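
The "run the command again" advice above can be sketched as a small shell helper.  This is only an illustration: the function name retry and the MAX_TRIES and RETRY_DELAY settings are assumptions made for this example, not part of apt or Raspberry Pi OS.

```shell
# Hypothetical helper: run a command, repeating it if it fails.
# MAX_TRIES and RETRY_DELAY (seconds) are illustrative defaults, not values from this guide.
retry() {
  max_tries=${MAX_TRIES:-3}
  delay=${RETRY_DELAY:-30}
  try=1
  # Keep running the command until it succeeds or the attempt limit is reached
  while ! "$@"; do
    if [ "$try" -ge "$max_tries" ]; then
      echo "Giving up after $try attempts: $*" >&2
      return 1
    fi
    echo "Attempt $try failed, retrying in $delay seconds..." >&2
    sleep "$delay"
    try=$((try + 1))
  done
}

# Usage (commented out so that pasting this block changes nothing):
#   retry sudo apt update
#   retry sudo apt full-upgrade -y
```

If the command still fails after the last attempt, the advice above still applies: try again later.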



Raspberry Pi OS Documentation

https://www.raspberrypi.com/documentation/computers/os.html



Connect to the target Raspberry Pi


Via Raspberry Pi Connect Remote shell or Raspberry Pi Connect Screen share then open a Terminal window.

https://www.raspberrypi.com/software/connect

  - or -

Via a Display, Keyboard and Mouse, then open a Terminal window.


  - or -

Via SSH


Determine the target Raspberry Pi IP Address:


Via Raspberry Pi Connect Remote shell or Raspberry Pi Connect Screen share then open a Terminal window.

https://www.raspberrypi.com/software/connect
sudo hostname -I
  - or -

Connect directly to the target Raspberry Pi via a Display, Keyboard and Mouse, then open a Terminal window.

sudo hostname -I
  - or -

Use an IP Scanner tool such as Advanced IP Scanner on a PC or similar to locate the DHCP IP Address assigned to the Raspberry Pi.

https://www.advanced-ip-scanner.com
  - or -

Log in to your router and examine the DHCP assignments, sometimes labeled "Connected Devices" or similar.
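
hostname -I prints every address assigned to the Raspberry Pi on one line, separated by spaces, and that list may include IPv6 addresses.  The sketch below picks out the first IPv4 address for use with SSH.  The function name first_ipv4 is an assumption made for this example.

```shell
# Hypothetical helper: print the first IPv4-looking address among its arguments.
first_ipv4() {
  for addr in "$@"; do
    case $addr in
      *:*) ;;                               # contains a colon: IPv6, skip it
      *.*.*.*) echo "$addr"; return 0 ;;    # dotted form: treat as IPv4
    esac
  done
  return 1                                  # no IPv4 address found
}

# Usage (commented out; run it on the Raspberry Pi itself):
#   first_ipv4 $(hostname -I)
```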



Use SSH via a tool such as PuTTY to connect to the Raspberry Pi.

https://putty.software/
https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html
https://www.putty.org
Connect using the IP address determined above or URL of the target Raspberry Pi.
Note:  The first time a connection is made, a security warning may be displayed.  Click Yes to continue.



Automatic Canonical Header Generation for PDF, TXT and/or other file types


Notes:


See "General Notes" 2. near the top of this document.

The procedure below is for implementing automatic canonical header generation for PDF, TXT and/or other file types being served by HTTPS.  The procedure can easily be modified to cover any desired file types and to generate canonical headers for HTTPS and/or HTTP.

To configure automatic canonical header generation for files being served by HTTPS, Let's Encrypt Certificates and Certbot must already be setup and configured.  Automatic canonical header generation for files being served by HTTP can be configured with or without Let's Encrypt Certificates and Certbot being setup and/or configured.  When Let's Encrypt Certificates and Certbot are setup and configured for a website, HTTP traffic is automatically redirected (301) to HTTPS for that website by default, therefore only the HTTPS portion of the automatic canonical header generation requires configuration for that website unless the automatic redirection is manually disabled (removed from or commented out in the website's HTTP Virtual Host).  If Let's Encrypt Certificates and Certbot are not setup and configured for a website, then only the HTTP portion of the automatic canonical header generation requires configuration for that website.
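
The decision described above (HTTPS only, HTTP only, or both) can be sketched as a shell check of the Virtual Host files.  This is a rough sketch under stated assumptions: the function name which_vhosts is made up for this example, the file names follow the exampledomain1.com.conf and exampledomain1.com-le-ssl.conf pattern used later in this document, and an active redirect is assumed to appear as an uncommented RewriteRule or Redirect line pointing at https://.  Check the files by eye if your configuration differs.

```shell
# Hypothetical helper: report which Virtual Host files for a domain need the canonical lines.
#   $1 = directory holding the site files, $2 = domain
which_vhosts() {
  dir=$1
  domain=$2
  https_conf="$dir/${domain}-le-ssl.conf"
  http_conf="$dir/${domain}.conf"
  if [ -f "$https_conf" ]; then
    echo "https"
    # Assumption: an active redirect is an uncommented RewriteRule or Redirect line
    # that points at https://.  If none is found, HTTP is still serving files directly.
    if [ -f "$http_conf" ] &&
       ! grep -Eq '^[[:space:]]*(RewriteRule|Redirect)[[:space:]].*https://' "$http_conf"; then
      echo "http"
    fi
  elif [ -f "$http_conf" ]; then
    echo "http"
  fi
}

# Usage (commented out):
#   which_vhosts /etc/apache2/sites-available exampledomain1.com
```

The function prints https, http, or both on separate lines, matching the HTTPS and HTTP procedures below.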


Update Raspberry Pi OS and Components


Download the latest package lists

sudo apt update -y

Download and install the updated packages listed in the package lists

sudo apt full-upgrade -y


Enable the Headers Module mod_headers

sudo a2enmod headers

Enable the Rewrite Module mod_rewrite

sudo a2enmod rewrite

Reload the Apache 2 Web Server

sudo systemctl reload apache2
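
To confirm that both modules are loaded after the reload, the module list printed by apache2ctl -M can be filtered.  A minimal sketch follows, with has_module as an assumed helper name.  It reads the list on standard input, so it can be tried without touching the server.

```shell
# Hypothetical helper: read "apache2ctl -M" output on stdin and
# succeed if the module named in $1 (without the _module suffix) is listed.
has_module() {
  grep -Eq "^[[:space:]]*${1}_module([[:space:]]|\$)"
}

# Usage (commented out):
#   sudo apache2ctl -M | has_module headers && echo "mod_headers is loaded"
#   sudo apache2ctl -M | has_module rewrite && echo "mod_rewrite is loaded"
```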




Reconfigure the HTTPS Virtual Hosts (Web Servers) ONLY when Let's Encrypt Certificates and Certbot are setup and configured for websites

Apache 2 supports one or more Virtual Hosts on a single machine.  In the examples below, two (2) Virtual Hosts are being reconfigured:
  exampledomain1.com
  exampledomain2.com

Note:

In the examples below:
  Replace exampledomain1.com and exampledomain2.com with your URLs.
  Adjust (pdf|txt) as appropriate.  Examples:  (pdf) or (pdf|txt|xlsx).
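
As an illustration of adjusting the file type list, the sketch below builds the (pdf|txt) group and the matching RewriteRule line from a list of extensions.  The function names ext_group and rewrite_rule are assumptions made for this example.  Extensions containing regular expression special characters are not handled.

```shell
# Hypothetical helper: join extensions into a regular expression group, e.g. (pdf|txt|xlsx).
ext_group() {
  group=""
  for ext in "$@"; do
    ext=${ext#.}                      # tolerate a leading dot, e.g. ".pdf"
    if [ -z "$group" ]; then
      group=$ext
    else
      group="$group|$ext"
    fi
  done
  printf '(%s)\n' "$group"
}

# Hypothetical helper: print the RewriteRule line used in this guide for those extensions.
rewrite_rule() {
  printf 'RewriteRule ^/?(.+\\.%s)$ - [E=CANONICAL_PATH:$1,NC]\n' "$(ext_group "$@")"
}

# Usage (commented out):
#   rewrite_rule pdf txt xlsx
```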


Disable the HTTPS Virtual Hosts to be reconfigured

sudo a2dissite exampledomain1.com-le-ssl.conf
sudo a2dissite exampledomain2.com-le-ssl.conf


Edit each of the HTTPS Virtual Host files using either Mousepad in the Raspberry Pi GUI or nano via SSH or Terminal


Launch Mousepad from Terminal in the Raspberry Pi GUI (Desktop)

sudo mousepad /etc/apache2/sites-available/exampledomain1.com-le-ssl.conf
  - or -

Launch nano via SSH or Terminal

sudo nano /etc/apache2/sites-available/exampledomain1.com-le-ssl.conf

Add these seven lines just above </VirtualHost> near the bottom of the file:
  # Automatically set canonical headers for all PDF and TXT files
  # Enable rewriting
  RewriteEngine On
  # Capture canonical path for PDF and TXT files (case-insensitive, strip leading slash)
  RewriteRule ^/?(.+\.(pdf|txt))$ - [E=CANONICAL_PATH:$1,NC]
  # Add canonical header using mod_headers
  Header set Link "<https://www.exampledomain1.com/%{CANONICAL_PATH}e>; rel=\"canonical\"" env=CANONICAL_PATH

Save and close Mousepad

File | Save [Ctrl+S] and File | Quit [Ctrl+Q] or X out
  - or -

Save and close nano

Press CTRL + X and then press y and ENTER to save changes


Launch Mousepad from Terminal in the Raspberry Pi GUI (Desktop)

sudo mousepad /etc/apache2/sites-available/exampledomain2.com-le-ssl.conf
  - or -

Launch nano via SSH or Terminal

sudo nano /etc/apache2/sites-available/exampledomain2.com-le-ssl.conf

Add these seven lines just above </VirtualHost> near the bottom of the file:
  # Automatically set canonical headers for all PDF and TXT files
  # Enable rewriting
  RewriteEngine On
  # Capture canonical path for PDF and TXT files (case-insensitive, strip leading slash)
  RewriteRule ^/?(.+\.(pdf|txt))$ - [E=CANONICAL_PATH:$1,NC]
  # Add canonical header using mod_headers
  Header set Link "<https://www.exampledomain2.com/%{CANONICAL_PATH}e>; rel=\"canonical\"" env=CANONICAL_PATH

Save and close Mousepad

File | Save [Ctrl+S] and File | Quit [Ctrl+Q] or X out
  - or -

Save and close nano

Press CTRL + X and then press y and ENTER to save changes
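
To see what the RewriteRule and Header lines above produce for a given request, the matching logic can be imitated in shell.  This is only an approximation for illustration: the function name canonical_link is an assumption, grep stands in for Apache's pattern matching, and the real header is generated by Apache, not by this script.

```shell
# Hypothetical helper: print the Link header the configuration above would be
# expected to add for a request path, or nothing if the path is not a PDF or TXT file.
#   $1 = canonical base URL, $2 = request path
canonical_link() {
  base=$1   # e.g. https://www.exampledomain1.com
  path=$2   # e.g. /somefile.pdf
  # Same pattern as the RewriteRule; -i mirrors the NC (no case) flag
  if printf '%s\n' "$path" | grep -Eiq '^/?(.+\.(pdf|txt))$'; then
    # ${path#/} drops one leading slash, like the capture group in the rule
    printf 'Link: <%s/%s>; rel="canonical"\n' "$base" "${path#/}"
  fi
}

# Usage (commented out):
#   canonical_link https://www.exampledomain1.com /somefile.pdf
```

Paths with other extensions, such as /index.html, produce no output, which corresponds to the env=CANONICAL_PATH condition leaving the header off.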


Test the HTTPS configurations (Optional)

sudo apachectl configtest
You should now see:
Syntax OK


Enable the HTTPS Virtual Hosts

sudo a2ensite exampledomain1.com-le-ssl.conf
sudo a2ensite exampledomain2.com-le-ssl.conf

Reload the Apache 2 Web Server

sudo systemctl reload apache2
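
The optional configuration test and the reload can be chained so that Apache is only reloaded when the test passes.  A minimal sketch follows.  The function name reload_if_valid and the two variables are assumptions made for this example; the commands they hold are the ones used in this document.

```shell
# The commands are held in variables so they can be swapped out, e.g. for a dry run.
CONFIGTEST_CMD=${CONFIGTEST_CMD:-sudo apachectl configtest}
RELOAD_CMD=${RELOAD_CMD:-sudo systemctl reload apache2}

# Hypothetical helper: reload Apache only if the configuration test succeeds.
reload_if_valid() {
  # The variables are left unquoted on purpose so each command splits into words
  if $CONFIGTEST_CMD; then
    $RELOAD_CMD
  else
    echo "Configuration test failed; Apache was not reloaded." >&2
    return 1
  fi
}

# Usage (commented out):
#   reload_if_valid
```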

Note:

If a site needs to be edited again, disable the site before editing it using the sudo a2dissite command with the syntax noted above.  After editing the site, save the changes and enable the site again using the sudo a2ensite command with the syntax noted above, then reload Apache using the sudo systemctl reload apache2 command for it to get and begin using the new configuration.


Test HTTPS Automatic Canonical Header Generation (Optional)


curl -I https://exampledomain1.com/<ApplicableFileOnTheWebServer> - Example:  curl -I https://exampledomain1.com/somefile.pdf
You should see something like:
HTTP/1.1 200 OK
Link: <https://www.exampledomain1.com/somefile.pdf>; rel="canonical"
If you see the Link: header, your automation is working.

curl -I https://www.exampledomain1.com/<ApplicableFileOnTheWebServer> - Example:  curl -I https://www.exampledomain1.com/somefile.pdf
You should see something like:
HTTP/1.1 200 OK
Link: <https://www.exampledomain1.com/somefile.pdf>; rel="canonical"
If you see the Link: header, your automation is working.

curl -I https://exampledomain2.com/<ApplicableFileOnTheWebServer> - Example:  curl -I https://exampledomain2.com/somefile.pdf
You should see something like:
HTTP/1.1 200 OK
Link: <https://www.exampledomain2.com/somefile.pdf>; rel="canonical"
If you see the Link: header, your automation is working.

curl -I https://www.exampledomain2.com/<ApplicableFileOnTheWebServer> - Example:  curl -I https://www.exampledomain2.com/somefile.pdf
You should see something like:
HTTP/1.1 200 OK
Link: <https://www.exampledomain2.com/somefile.pdf>; rel="canonical"
If you see the Link: header, your automation is working.
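
The checks above can be narrowed to just the canonical header by filtering the curl output.  The sketch below assumes a helper named canonical_header (not part of curl).  It strips the carriage returns that end each HTTP header line and matches the header name without regard to case, because HTTP/2 responses show it as link: in lowercase.

```shell
# Hypothetical helper: read response headers on stdin and print only the
# canonical Link header; fails (non-zero status) if none is present.
canonical_header() {
  tr -d '\r' | grep -i '^link:.*rel="canonical"'
}

# Usage (commented out):
#   curl -sI https://www.exampledomain1.com/somefile.pdf | canonical_header
```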




Reconfigure the HTTP Virtual Hosts (Web Servers) ONLY when HTTP to HTTPS redirection has been disabled or Let's Encrypt Certificates and Certbot are NOT setup and configured for websites

Apache 2 supports one or more Virtual Hosts on a single machine.  In the examples below, two (2) Virtual Hosts are being reconfigured:
  exampledomain1.com
  exampledomain2.com

Note:

In the examples below:
  Replace exampledomain1.com and exampledomain2.com with your URLs.
  Adjust (pdf|txt) as appropriate.  Examples:  (pdf) or (pdf|txt|xlsx).


Disable the HTTP Virtual Hosts to be reconfigured

sudo a2dissite exampledomain1.com.conf
sudo a2dissite exampledomain2.com.conf


Edit each of the HTTP Virtual Host files using either Mousepad in the Raspberry Pi GUI or nano via SSH or Terminal


Launch Mousepad from Terminal in the Raspberry Pi GUI (Desktop)

sudo mousepad /etc/apache2/sites-available/exampledomain1.com.conf
  - or -

Launch nano via SSH or Terminal

sudo nano /etc/apache2/sites-available/exampledomain1.com.conf

Add these seven lines just above </VirtualHost> near the bottom of the file:
  # Automatically set canonical headers for all PDF and TXT files
  # Enable rewriting
  RewriteEngine On
  # Capture canonical path for PDF and TXT files (case-insensitive, strip leading slash)
  RewriteRule ^/?(.+\.(pdf|txt))$ - [E=CANONICAL_PATH:$1,NC]
  # Add canonical header using mod_headers
  Header set Link "<http://www.exampledomain1.com/%{CANONICAL_PATH}e>; rel=\"canonical\"" env=CANONICAL_PATH

Save and close Mousepad

File | Save [Ctrl+S] and File | Quit [Ctrl+Q] or X out
  - or -

Save and close nano

Press CTRL + X and then press y and ENTER to save changes


Launch Mousepad from Terminal in the Raspberry Pi GUI (Desktop)

sudo mousepad /etc/apache2/sites-available/exampledomain2.com.conf
  - or -

Launch nano via SSH or Terminal

sudo nano /etc/apache2/sites-available/exampledomain2.com.conf

Add these seven lines just above </VirtualHost> near the bottom of the file:
  # Automatically set canonical headers for all PDF and TXT files
  # Enable rewriting
  RewriteEngine On
  # Capture canonical path for PDF and TXT files (case-insensitive, strip leading slash)
  RewriteRule ^/?(.+\.(pdf|txt))$ - [E=CANONICAL_PATH:$1,NC]
  # Add canonical header using mod_headers
  Header set Link "<http://www.exampledomain2.com/%{CANONICAL_PATH}e>; rel=\"canonical\"" env=CANONICAL_PATH

Save and close Mousepad

File | Save [Ctrl+S] and File | Quit [Ctrl+Q] or X out
  - or -

Save and close nano

Press CTRL + X and then press y and ENTER to save changes


Test the HTTP configurations (Optional)

sudo apachectl configtest
You should now see:
Syntax OK


Enable the HTTP Virtual Hosts

sudo a2ensite exampledomain1.com.conf
sudo a2ensite exampledomain2.com.conf

Reload the Apache 2 Web Server

sudo systemctl reload apache2

Note:

If a site needs to be edited again, disable the site before editing it using the sudo a2dissite command with the syntax noted above.  After editing the site, save the changes and enable the site again using the sudo a2ensite command with the syntax noted above, then reload Apache using the sudo systemctl reload apache2 command for it to get and begin using the new configuration.


Test HTTP Automatic Canonical Header Generation (Optional)


curl -I http://exampledomain1.com/<ApplicableFileOnTheWebServer> - Example:  curl -I http://exampledomain1.com/somefile.pdf
You should see something like:
HTTP/1.1 200 OK
Link: <http://www.exampledomain1.com/somefile.pdf>; rel="canonical"
If you see the Link: header, your automation is working.

curl -I http://www.exampledomain1.com/<ApplicableFileOnTheWebServer> - Example:  curl -I http://www.exampledomain1.com/somefile.pdf
You should see something like:
HTTP/1.1 200 OK
Link: <http://www.exampledomain1.com/somefile.pdf>; rel="canonical"
If you see the Link: header, your automation is working.

curl -I http://exampledomain2.com/<ApplicableFileOnTheWebServer> - Example:  curl -I http://exampledomain2.com/somefile.pdf
You should see something like:
HTTP/1.1 200 OK
Link: <http://www.exampledomain2.com/somefile.pdf>; rel="canonical"
If you see the Link: header, your automation is working.

curl -I http://www.exampledomain2.com/<ApplicableFileOnTheWebServer> - Example:  curl -I http://www.exampledomain2.com/somefile.pdf
You should see something like:
HTTP/1.1 200 OK
Link: <http://www.exampledomain2.com/somefile.pdf>; rel="canonical"
If you see the Link: header, your automation is working.




Remove packages that were automatically installed and are no longer required

Occasionally, packages that were installed automatically during updates, upgrades and installations are no longer required.  These can be removed automatically.

Automatically detect and remove packages no longer required

sudo apt autoremove -y



Charles Varvayanis
Sonora, CA  95370
e-mail:  charles@varvayanis.com
Phone:  (209) 586-3782
Fax:  (209) 586-3761
Business Card (PDF 153 KB)
www.varvayanis.com

© 2026 Charles Varvayanis.  All rights reserved.